Developing Corpus of Lecture Utterances Aligned to Slide Components

Authors

  • Ryo Minamiguchi
  • Masatoshi Tsuchiya
Abstract

Formulating automatic text summarization as a maximum coverage problem with a knapsack constraint over a set of textual units and a set of weighted conceptual units is a promising approach. However, determining the appropriate granularity of conceptual units for this formulation is both important and difficult. To resolve this problem, we are examining the use of presentation slide components as conceptual units for generating a summary of lecture utterances, instead of other possible conceptual units such as base noun phrases or important nouns. This paper describes the corpus we are developing to evaluate our proposed approach; it consists of presentation slides and lecture utterances aligned to presentation slide components.
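
The formulation named in the abstract can be illustrated with a small sketch. The code below is not the authors' method: it is a simple greedy heuristic for maximum coverage under a knapsack (length) budget, and it is not guaranteed to find the optimal summary. The function name `greedy_summary`, the utterance IDs, the component names, the weights, and the costs are all hypothetical values chosen for illustration. Each utterance is assumed to have a positive cost (for example, its length) and a set of slide components it is aligned to.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data: (utterance id, cost, slide components it covers).
UTTERANCES: List[Tuple[str, int, Set[str]]] = [
    ("u1", 5, {"title", "bullet1"}),
    ("u2", 3, {"bullet1"}),
    ("u3", 4, {"bullet2", "figure"}),
    ("u4", 6, {"title", "bullet2"}),
]

# Hypothetical weights of the conceptual units (slide components).
WEIGHTS: Dict[str, float] = {
    "title": 3.0,
    "bullet1": 2.0,
    "bullet2": 2.0,
    "figure": 1.0,
}


def greedy_summary(
    utterances: List[Tuple[str, int, Set[str]]],
    weights: Dict[str, float],
    budget: int,
) -> List[str]:
    """Greedily select utterances by newly covered weight per unit cost.

    Assumes every cost is a positive integer. This is a heuristic sketch,
    not an exact solver for the maximum coverage problem.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = budget
    candidates = list(utterances)
    while candidates:
        best = None
        best_ratio = 0.0
        for uid, cost, comps in candidates:
            if cost > remaining:
                continue  # would violate the knapsack constraint
            # Only components not yet covered contribute to the gain.
            gain = sum(weights.get(c, 0.0) for c in comps - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = (uid, cost, comps), ratio
        if best is None:
            break  # nothing affordable adds new coverage
        chosen.append(best[0])
        covered |= best[2]
        remaining -= best[1]
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    print(greedy_summary(UTTERANCES, WEIGHTS, budget=10))
```

With the hypothetical data above and a budget of 10, the sketch selects `u1` and then `u3`, which together cover all four components. Coarser or finer conceptual units change the coverage sets and therefore the selected summary, which is the granularity issue the abstract raises.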


Similar resources

Automatic Alignment Between Classroom Lecture Utterances and Slide Components

Multimodal alignment between classroom lecture utterances and lecture slide components is one of the crucial problems in realizing a multimodal e-Learning application. This paper proposes a new method for automatic alignment, and formulates the alignment as an integer linear programming (ILP) problem that maximizes a score function consisting of three factors: the similarity score betwe...

A Korean Spoken Document Retrieval System for Lecture Search

In this paper, we introduce a Korean spoken document retrieval system for lecture search. We automatically build a general inverted index table from spoken document transcriptions, and we extract additional information from textbooks or slide notes related to the lecture. We integrate these two sources for a search process. The speech corpus used in our system is from a high school mathematics ...

The Negochat Corpus of Human-agent Negotiation Dialogues

Annotated in-domain corpora are crucial to the successful development of dialogue systems of automated agents, and in particular for developing natural language understanding (NLU) components of such systems. Unfortunately, such important resources are scarce. In this work, we introduce an annotated natural language human-agent dialogue corpus in the negotiation domain. The corpus was collected...

Beyond "Next Slide, Please": the Use of Content and Speech in Multi-modal Control

The Intelligent Classroom is an automated lecture facility where one of the primary goals is that speakers be able to control it by interacting with it as they would with a human A/V technician. In this paper we describe our research in imbedding Microsoft Powerpoint into the Intelligent Classroom. In particular we discuss how we use two modes of sensing (Computer Vision and Speech Recognition)...

Supervised Spoken Document Summarization jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine

In extractive spoken document summarization, it is desired to select important utterances from documents to construct the summary while avoiding redundancy among the selected utterances, but it is not easy to balance the two different goals. In this paper, a supervised spoken document summarization approach is proposed based on structured support vector machine (SVM), in which the above two goa...



Publication date: 2016